Add SKYRL_LOG_LEVEL env var and suppress verbose Ray logs by default#947
Add SKYRL_LOG_LEVEL env var and suppress verbose Ray logs by default#947tyler-griggs wants to merge 2 commits intomainfrom
Conversation
- Add SKYRL_LOG_LEVEL to env_vars.py for centralized log level control - Update configure_ray_worker_logging to use SKYRL_LOG_LEVEL - Add log_to_driver=False to suppress worker/raylet log forwarding - Set RAY_BACKEND_LOG_LEVEL=fatal to suppress C++ metrics errors - Enable verbose logging when SKYRL_LOG_LEVEL=DEBUG Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a new environment variable SKYRL_LOG_LEVEL to centralize control over logging levels and suppresses verbose Ray logs by default. This is a good change that improves the signal-to-noise ratio of training logs. The implementation is clean and follows the existing coding patterns. I've identified one potential issue regarding the lack of validation for the new environment variable, which could cause the application to crash if an invalid value is provided. I've included a suggestion to make the implementation more robust.
| # Logging | ||
| # ───────────────────────────────────────────────────────────────────────────── | ||
|
|
||
| SKYRL_LOG_LEVEL = os.environ.get("SKYRL_LOG_LEVEL", "INFO").upper() |
There was a problem hiding this comment.
The SKYRL_LOG_LEVEL is read from the environment but not validated. If a user provides an invalid log level (e.g., SKYRL_LOG_LEVEL=TYPO), the application will crash later when loguru tries to use it in configure_ray_worker_logging. It's better to validate the input against standard logging levels and fall back to a default value with a warning to prevent the application from crashing.
| SKYRL_LOG_LEVEL = os.environ.get("SKYRL_LOG_LEVEL", "INFO").upper() | |
| import logging | |
| _log_level_input = os.environ.get("SKYRL_LOG_LEVEL", "INFO").upper() | |
| if _log_level_input not in logging._nameToLevel: | |
| # Using print because logger is not yet configured. | |
| print( | |
| f"WARNING: Invalid SKYRL_LOG_LEVEL '{_log_level_input}'. " | |
| f"Valid options are {list(logging._nameToLevel.keys())}. Defaulting to 'INFO'." | |
| ) | |
| _log_level_input = "INFO" | |
| SKYRL_LOG_LEVEL = _log_level_input |
- Add SKYRL_LOG_DIR env var for log file location (default: /tmp/skyrl-logs) - Create new logging module (skyrl_train/utils/logging.py) with: - setup_logging(): Routes all logs to file, training progress to stdout - configure_worker_logging(): Simplified worker logging setup - Training progress loggers (trainer, fully_async_trainer, tracking, evaluate) go to stdout by default - Infrastructure loggers (vllm, ray, workers, inference_engines, etc.) only go to log file by default - Set SKYRL_LOG_LEVEL=DEBUG to show all logs on stdout - Update main_base.py to call setup_logging at startup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
SKYRL_LOG_LEVELenvironment variable toenv_vars.pyfor centralized log level controllog_to_driver=FalseRAY_BACKEND_LOG_LEVEL=fatalSKYRL_LOG_LEVEL=DEBUGThis improves the signal-to-noise ratio of training logs by hiding infrastructure logs and only showing training progress.
Test plan
SKYRL_LOG_LEVEL=DEBUG- verbose Ray logs should appear🤖 Generated with Claude Code